Abstract

The Hamming oracle returns the Hamming distance between an unknown binary n-vector x and
a binary query n-vector y. The objective is to determine x uniquely using a sequence of m queries.
What is the minimum number of queries required in the worst case? We consider the query ratio
m/n to be our figure of merit and derive upper bounds on the query ratio by explicitly constructing
(m, n) query matrices. We show that our recursive and algebraic construction results in query ratios
arbitrarily close to zero. Our construction is based on codes of constant weight. A decoding algorithm
for recovering the unknown binary vector is also described.

I. Introduction 

An unknown binary vector x of length n bits is to be determined by posing a sequence of queries
to an oracle. Each query takes the form of a binary vector y of length n bits. We consider two kinds
of oracles: (i) the Hamming distance oracle, which returns the Hamming distance d(x, y) between x and
y, and (ii) the overlap oracle, which returns the Hamming weight w(x.y) of the componentwise product
x.y, i.e., the inner product of x and y. Our measure of efficiency is the number of queries required to
determine x for the worst x, and our objective is to find a sequence of queries that minimizes the
number of queries in the worst case. There are two cases of interest: (i) adaptive, where the ith query
may depend on all preceding queries and responses, and (ii) non-adaptive, where the queries are
formulated in advance. We only consider the non-adaptive case in this paper.
The binary set {0, 1} is treated as a subset of the integers. The weight of a binary vector x, denoted
w(x), counts the number of ones in x, and the Hamming distance d(x, y) counts the number of positions
in which x and y differ. The inner product is the standard Euclidean inner product $\sum_{i=1}^{n} x_i y_i$.

The overlap oracle is related to the group testing oracle [6]. In group testing, an unknown binary
n-vector x has Hamming weight w(x) ≤ d for some given d < n. A query y is a binary n-vector, and
the oracle returns 1 if w(x.y) > 0 and 0 if w(x.y) = 0. The main differences between the problem
studied here and the group testing problem are that (i) the group testing oracle is less informative, and
(ii) we impose no restriction on the Hamming weight of the unknown vector x. Since the Hamming
and overlap oracles are more informative than the group testing oracle, we expect that fewer
queries will suffice to determine x. Previous work on the Hamming oracle includes [7] and
[14], where it is observed that m < n queries suffice to determine x. The problem is closely related to
the distinct subset sum problem, which is the problem of constructing sets of natural numbers such
that the sum over any subset is unique. The reader is referred to papers by Conway and Guy [1],
Guy [11], Lunnon [12] and Bohman [2], among others.

The paper is organized as follows. Basic notation and some examples are presented in Sec. II. A
lower bound on the query ratio is proved in Sec. III using a packing argument, and a previously
known upper bound based on a probabilistic argument is also stated there. The distinct subset sum
problem is described in Sec. IV, and a preliminary construction is given. Sec. V contains some relevant
results on constant weight codes and block designs. Our basic construction is given in Sec. VI, followed
by an iterated construction in Sec. VII. A decoding method is described in Sec. VIII, achievable query
ratios are derived in Sec. IX, example constructions are given in Sec. X, and conclusions are in
Sec. XI.

II. Preliminaries 

All vectors are column vectors. From the identity $d(x, y) = w(x) + w(y) - 2w(x.y)$, it follows that
if w(x) is known, then w(x.y) can be determined from d(x, y). The cost of determining w(x) is a
single query, namely the all-ones query, since $d(x, \mathbf{1}) = n - w(x)$. Thus, if m queries to the
overlap oracle are sufficient to determine x, then at most m + 1 queries to the Hamming oracle suffice.
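To make the identity concrete, here is a small sketch of our own. The helper names `hamming`, `overlap` and `overlaps_from_hamming` are hypothetical. The code simulates m + 1 Hamming queries, where the extra query is the all-ones vector, and converts the answers into the m overlap responses. The sample query rows are arbitrary.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance d(x, y): number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def overlap(x, y):
    """Overlap oracle w(x.y): number of positions where both x and y are 1."""
    return sum(a * b for a, b in zip(x, y))

def overlaps_from_hamming(queries, x):
    """Turn m + 1 Hamming-oracle answers into the m overlap-oracle answers."""
    n = len(x)
    w_x = n - hamming(x, [1] * n)          # all-ones query: d(x, 1) = n - w(x)
    answers = []
    for y in queries:
        d = hamming(x, y)                  # one Hamming query per row
        # d(x, y) = w(x) + w(y) - 2 w(x.y)  =>  w(x.y) = (w(x) + w(y) - d) / 2
        answers.append((w_x + sum(y) - d) // 2)
    return answers

# Usage: both routes give the same answers for every 5-bit x.
Q = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 1], [1, 1, 0, 0, 0]]
for x in product((0, 1), repeat=5):
    assert overlaps_from_hamming(Q, x) == [overlap(x, y) for y in Q]
```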

For the overlap oracle, x can be obtained as a solution to the system of linear equations

$$Qx = \omega, \tag{1}$$

where Q is an (m, n) binary matrix whose ith row $q_i$ is the ith query vector, x is the unknown
binary vector, $\omega = (\omega_i)$ is an m-vector of non-negative integers with $\omega_i = w(x.q_i)$, and
multiplication is over the reals.

If (1) has a unique solution for every x, then no non-zero vector in the null space $\mathcal{N}(Q)$ of Q
can consist solely of entries from the set {−1, 0, 1}. If one did, it would be the difference $x_1 - x_2$ of two
distinct binary vectors satisfying $Qx_1 = Qx_2$. This leads to the following definition.

Definition 1. Query matrix Q is said to be uniquely identifying (UI) if the only {−1, 0, 1}-valued
vector in $\mathcal{N}(Q)$ is the all-zero vector. Equivalently, Q is UI if any two disjoint subsets of columns,
not both empty, have unequal column sums.
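The two forms of Definition 1 can be compared by brute force on tiny matrices. The sketch below is our own, and the function names are hypothetical. It tests UI once through the null-space condition over $\{-1,0,1\}^n$ and once through injectivity of $x \mapsto Qx$ on $\{0,1\}^n$, which is the same as distinct subset column sums. It then checks that the two tests agree on every binary 3 × 4 matrix.

```python
from itertools import product

def null_space_test(Q):
    """Definition 1 directly: no nonzero {-1,0,1}-vector z with Qz = 0."""
    n = len(Q[0])
    for z in product((-1, 0, 1), repeat=n):
        if any(z) and all(sum(q * v for q, v in zip(row, z)) == 0 for row in Q):
            return False
    return True

def column_sum_test(Q):
    """Equivalent form: all 2^n subset column sums are distinct (x -> Qx is injective)."""
    m = len(Q)
    sums = [(0,) * m]                      # column sum of the empty subset
    for col in zip(*Q):                    # each column doubles the list of subsets
        sums += [tuple(a + b for a, b in zip(s, col)) for s in sums]
    return len(set(sums)) == len(sums)

# Usage: the two characterizations agree on every binary 3 x 4 matrix.
for bits in product((0, 1), repeat=12):
    Q = [list(bits[4 * i:4 * i + 4]) for i in range(3)]
    assert null_space_test(Q) == column_sum_test(Q)
```

The subset-sum form needs only $2^n$ evaluations instead of $3^n$. For that reason it is the more practical check for the small examples that follow.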

Definition 2. A query ratio ρ is said to be achievable if there exists a UI query matrix Q of size
(m, n) with m/n ≤ ρ.

Clearly, if Q = I, where I is the identity matrix, then Q is UI; thus ρ = 1 is achievable. It is
interesting that query ratios arbitrarily close to 0 are achievable.

Example 1. We claim that the (4, 5) matrix

$$Q = \begin{pmatrix} 1&1&1&1&1 \\ 0&1&0&0&1 \\ 0&0&1&0&1 \\ 0&0&0&1&1 \end{pmatrix} \tag{2}$$

is UI. Thus ρ = 0.8 is achievable.



The proof consists of showing that no non-zero vector z with entries from
{−1, 0, 1} lies in $\mathcal{N}(Q)$. A non-zero {−1, 0, 1}-valued vector z in $\mathcal{N}(Q)$ identifies two disjoint
subsets of columns that have identical sums. In this case the two subsets must have the same size,
since the top row is all ones. A subset cannot consist of a single column, since all columns
are distinct. If a subset consists of two columns, it cannot pair the last column with any column other
than the first. Otherwise one of the lower three positions of its sum would be two, which cannot equal
the corresponding entry of the sum of any two of the remaining columns. By a similar argument, the first
column must be paired with the last column. But then, with the first and last columns paired together,
equal column sums cannot be achieved using any two of the middle three columns, since one of the lower
three positions of their sum will always be zero. Finally, two disjoint subsets of three or more columns
each would require at least six columns.
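The case analysis can be cross-checked mechanically. The minimal sketch below assumes the subset-sum form of Definition 1; the helper name `is_ui` is ours. It enumerates all $2^5$ subset column sums of the matrix (2) and checks that they are distinct. As a contrast, it also tests the matrix with its top row removed, which is not UI because its first column becomes all zero.

```python
def is_ui(Q):
    """Q is UI iff the 2^n subset column sums of Q are pairwise distinct."""
    m = len(Q)
    sums = [(0,) * m]                      # column sum of the empty subset
    for col in zip(*Q):                    # each column doubles the list of subsets
        sums += [tuple(a + b for a, b in zip(s, col)) for s in sums]
    return len(set(sums)) == len(sums)

# The (4, 5) matrix (2) of Example 1.
EXAMPLE1 = [[1, 1, 1, 1, 1],
            [0, 1, 0, 0, 1],
            [0, 0, 1, 0, 1],
            [0, 0, 0, 1, 1]]

print(is_ui(EXAMPLE1))       # True
print(is_ui(EXAMPLE1[1:]))   # False: without the top row the first column is all zero
```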
Besides proving that Q is UI, the above example shows how hard it is to scale
this case analysis to larger matrices, so a more efficient method of proof is needed. Note
also that the query matrix in the above example is valid for both the Hamming and overlap
oracles.

For query matrices with an all-1's top row, we can show that a 3 × 4 UI matrix does not exist. Thus,
for n = 4, m = 3 queries are never enough for the Hamming oracle. However, three queries suffice
for the overlap oracle, as the following example shows.



Example 2. The (3, 4) matrix

$$Q = \begin{pmatrix} 1&1&0&1 \\ 1&0&1&1 \\ 0&1&1&1 \end{pmatrix}$$

is UI.



If we append an all-zero column to this Q and then place an all-1's row on top, we get a 4 × 5
uniquely identifying matrix for the Hamming oracle.
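The three claims around Example 2 are small enough to check exhaustively. In this sketch of our own (the helper names are hypothetical), we verify that the (3, 4) matrix is UI and that the 4 × 5 extension is UI. We then search all $2^8$ binary 3 × 4 matrices with an all-1's top row and find that none is UI. The search is our own cross-check, not the authors' argument.

```python
from itertools import product

def is_ui(Q):
    """Q is UI iff the 2^n subset column sums of Q are pairwise distinct."""
    m = len(Q)
    sums = [(0,) * m]
    for col in zip(*Q):
        sums += [tuple(a + b for a, b in zip(s, col)) for s in sums]
    return len(set(sums)) == len(sums)

EXAMPLE2 = [[1, 1, 0, 1],
            [1, 0, 1, 1],
            [0, 1, 1, 1]]
# Append an all-zero column, then put an all-1's row on top (Hamming-oracle version).
HAMMING_4x5 = [[1] * 5] + [row + [0] for row in EXAMPLE2]

def ui_matrices_with_ones_row(m, n):
    """Exhaustively list UI (m, n) matrices whose top row is all ones (tiny m, n only)."""
    found = []
    for bits in product((0, 1), repeat=(m - 1) * n):
        Q = [[1] * n] + [list(bits[i * n:(i + 1) * n]) for i in range(m - 1)]
        if is_ui(Q):
            found.append(Q)
    return found

print(is_ui(EXAMPLE2), is_ui(HAMMING_4x5))   # True True
print(len(ui_matrices_with_ones_row(3, 4)))  # 0
```

The 3 × 4 search fails for a simple reason. UI forces the four lower 2-bit columns to be distinct, so they must be all of 00, 01, 10 and 11, and then {00, 11} and {01, 10} have equal sums.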

III. General Observations 
Using a packing argument, we prove the following lower bound on the query ratio ρ of a UI matrix.

Theorem 1. For the Hamming oracle, the query ratio must satisfy $\rho = m/n \ge 1/\log_2(n + 1)$ for any
sequence of m queries that determines every n-bit x uniquely.

Proof: For each x, the Hamming oracle must return a distinct m-vector of Hamming
distances in order to guarantee that x can be determined. Each distance lies in {0, 1, ..., n}, so there
are at most $(n + 1)^m$ distinct Hamming distance vectors. Thus $2^n \le (n + 1)^m$, or $\rho = m/n \ge 1/\log_2(n + 1)$. ■
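The counting step can be evaluated exactly with integers. This sketch uses a hypothetical helper name, `min_queries_packing`, and returns the smallest m with $(n+1)^m \ge 2^n$, which is the number of queries Theorem 1 forces. For n = 2500 it gives 222. That figure is useful as a reference point for the level-1 construction of Sec. VI.

```python
def min_queries_packing(n):
    """Smallest m with (n + 1)**m >= 2**n, the counting bound behind Thm. 1."""
    m = 0
    while (n + 1) ** m < 2 ** n:   # exact big-integer comparison, no floating point
        m += 1
    return m

print(min_queries_packing(16))     # 4
print(min_queries_packing(2500))   # 222
```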
In a recent contribution, it has been shown using the probabilistic method that any

$$\rho \ge \frac{\log_2 9}{\log_2 n}\,\bigl(1 + \phi(n)\bigr) \tag{3}$$

is achievable, where φ(n) → 0 as n → ∞. This was accomplished by showing that binary matrices
with m rows and n columns exist for which no two disjoint sets of columns have equal sums.

IV. Relation to the Distinct-Subset-Sum Problem and a Construction

A set of positive integers whose subset sums are all distinct is said to have distinct subset
sums. We call such a set with n elements an n-DSS set. A well-known example of a DSS set
is $\{1, 2, 4, \ldots, 2^{n-1}\}$. If we denote the ith element of a set S (in increasing order) by r(i), then for
the exponential set $r(n) = (1/2)2^n$. The objective is to construct n-DSS sets for which r(n) is small. The
current record-holding construction for an n-DSS set [2] achieves $r(n) \le 0.22002 \cdot 2^n$.

The similarity of our problem to the n-DSS set problem comes from the observation that for an
n-DSS set, the only {−1, 0, 1}-valued vector y for which $\sum_{i=1}^{n} r(i)\, y_i = 0$ is the all-zero vector y = 0.

We now present an elementary construction based on an n-DSS set. 

Construction 1. Given an n-DSS set S, set $m = \lfloor \log_2 r(n) \rfloor + 1$, the number of bits in the binary
expansion of the largest element r(n), and construct the (m, n) matrix Q by setting its ith column to
the m-bit binary expansion of r(i), the ith element of S.

Theorem 2. The matrix Q of Construction 1 is UI. 

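Proof: Suppose the column sums over I and J, two disjoint subsets of columns of Q that are not both
empty, are identical. Denote these column sums by x and y, respectively, where x and y are integer
vectors of length m indexed from 0 and row i of Q holds the coefficient of $2^i$. This means
$\sum_{i=0}^{m-1} x_i 2^i = \sum_{i=0}^{m-1} y_i 2^i$. But $\sum_{i=0}^{m-1} x_i 2^i = \sum_{j \in I} r(j)$, and likewise for J, so
$\sum_{j \in I} r(j) = \sum_{j \in J} r(j)$. This is a contradiction, since S has distinct subset sums. ■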

For the record construction, $r(n) \le 0.22002 \cdot 2^n < 2^{n-2}$. Hence Construction 1 shows that UI query matrices of size (n − 2, n) exist for n suitably large.

By reversing the steps of the construction, we can build a set of positive integers starting
with an (m, n) binary matrix Q. However, the resulting set S need not have distinct subset
sums. This can be seen with the matrix Q of Example 1. The resulting set S = {1, 3, 5, 9, 15}
does not have distinct subset sums (for instance, 1 + 3 + 5 = 9), even though Q has distinct column sums.
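Both directions can be sketched in a few lines. The helper names are hypothetical. The 4-element set {3, 5, 6, 7} is our own choice of a DSS set for illustration. Applying Construction 1 to it, with row 0 holding the $2^0$ bit, reproduces exactly the matrix of Example 2. Reading the columns of Example 1's matrix as binary numbers gives {1, 3, 5, 9, 15}, which fails the DSS test even though the matrix is UI.

```python
from itertools import combinations

def has_distinct_subset_sums(S):
    """True iff all 2^|S| subset sums of S are distinct."""
    seen = set()
    for k in range(len(S) + 1):
        for subset in combinations(S, k):
            total = sum(subset)
            if total in seen:
                return False
            seen.add(total)
    return True

def construction1(S):
    """Construction 1: column i is the binary expansion of r(i); row 0 holds the 2^0 bit."""
    S = sorted(S)
    m = S[-1].bit_length()             # floor(log2 r(n)) + 1 bits
    return [[(r >> i) & 1 for r in S] for i in range(m)]

def matrix_to_set(Q):
    """Reverse direction: read column j as sum_i Q[i][j] * 2^i."""
    return sorted(sum(Q[i][j] << i for i in range(len(Q))) for j in range(len(Q[0])))

def is_ui(Q):
    """Q is UI iff the 2^n subset column sums of Q are pairwise distinct."""
    sums = [(0,) * len(Q)]
    for col in zip(*Q):
        sums += [tuple(a + b for a, b in zip(s, col)) for s in sums]
    return len(set(sums)) == len(sums)

S = [3, 5, 6, 7]                       # a 4-DSS set chosen for illustration
print(has_distinct_subset_sums(S), construction1(S))
# True [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]

EXAMPLE1 = [[1, 1, 1, 1, 1], [0, 1, 0, 0, 1], [0, 0, 1, 0, 1], [0, 0, 0, 1, 1]]
T = matrix_to_set(EXAMPLE1)
print(T, has_distinct_subset_sums(T), is_ui(EXAMPLE1))
# [1, 3, 5, 9, 15] False True
```

The asymmetry comes from the arithmetic. A UI matrix only needs distinct integer column sums, whereas reading columns as binary numbers lets carries merge different sums.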

A better method for using DSS sets for the Hamming oracle problem is proposed later in this 
paper. 

V. Background for Algebraic Constructions 

Material in this section is drawn from books by MacWilliams and Sloane [13], Hall [10], and
Cameron [3].

Definition 3 (Hall). A balanced block design (b, v, r, k, λ) is an arrangement of v distinct objects
into b blocks such that each block contains exactly k distinct objects, each object occurs in exactly
r different blocks, and every pair of distinct objects occurs together in exactly λ blocks.

Balanced block designs are categorized as complete or incomplete. A complete design is the set of all $\binom{v}{k}$
combinations of k objects from a set of v objects. An incomplete design is a proper subset of these
combinations in which each pair of objects occurs an equal number of times.

It is known that every balanced block design must satisfy the identities

$$bk = vr, \tag{4}$$
$$r(k - 1) = \lambda(v - 1). \tag{5}$$

The following is known about the existence of block designs. 

Theorem 3 (Wilson [15]). Given λ and k, there is a $v_0$ such that if $v > v_0$, $\lambda(v - 1) \equiv 0 \pmod{k - 1}$
and $\lambda v(v - 1) \equiv 0 \pmod{k(k - 1)}$, then there exists a design (b, v, r, k, λ) with
$r = \lambda(v - 1)/(k - 1)$ and $b = \lambda v(v - 1)/(k(k - 1))$.
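The divisibility conditions in Theorem 3 are easy to tabulate. This is a minimal sketch with hypothetical helper names, applied to λ = 1. The conditions are necessary for any such design, but the theorem guarantees existence only for $v > v_0$. The two Steiner systems used later in the paper, S(2, 4, 25) and S(2, 8, 64), pass the test. Their block counts, 50 and 72, agree with the formula for b.

```python
def wilson_admissible(v, k, lam=1):
    """Divisibility conditions of Thm. 3 (necessary; sufficient only for v > v0)."""
    return (lam * (v - 1)) % (k - 1) == 0 and (lam * v * (v - 1)) % (k * (k - 1)) == 0

def design_parameters(v, k, lam=1):
    """(b, r) from Thm. 3; they also satisfy bk = vr (4) and r(k-1) = lam(v-1) (5)."""
    r = lam * (v - 1) // (k - 1)
    b = lam * v * (v - 1) // (k * (k - 1))
    return b, r

print([v for v in range(5, 30) if wilson_admissible(v, 4)])   # [13, 16, 25, 28]
print(design_parameters(25, 4), design_parameters(64, 8))     # (50, 8) (72, 9)
```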



From Wilson's theorem it follows that for v large enough, and satisfying the divisibility conditions
of Theorem 3 with λ = 1, there exists a λ = 1 design with

$$b = \frac{v(v - 1)}{k(k - 1)} \tag{6}$$

blocks. The incidence matrix of a block design is a matrix with b rows and v columns, the (i, j)
entry of which is 1 if block i contains object j, and 0 otherwise. In our constructions we will use
subsets of the rows of such incidence matrices for block designs with λ = 1. The rows of the
incidence matrix of a block design with λ = 1 form a code of constant Hamming weight k, block length
v and minimum Hamming distance $d_{\min} = 2(k - 1)$. This holds because two distinct blocks share at most one object,
and two blocks through a common object share exactly one.
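The code property can be illustrated on two small λ = 1 designs of our own choosing. These are the Fano plane, obtained by developing {0, 1, 3} mod 7, and the projective plane of order 3, obtained by developing {0, 1, 3, 9} mod 13. Both are standard planar difference sets, and the helper names are hypothetical. The sketch checks that every pair of objects lies in exactly one block and that the rows have minimum distance 2(k − 1).

```python
from itertools import combinations

def cyclic_design(base, v):
    """Develop a base block mod v; for a planar difference set this gives a lambda = 1 design."""
    return [{(b + t) % v for b in base} for t in range(v)]

def incidence_matrix(blocks, v):
    """Rows are blocks, columns are objects; entry 1 if the block contains the object."""
    return [[int(j in blk) for j in range(v)] for blk in blocks]

def min_distance(rows):
    """Minimum Hamming distance between distinct rows."""
    return min(sum(a != b for a, b in zip(u, w)) for u, w in combinations(rows, 2))

def pair_counts(rows):
    """Set of values of lambda: how many blocks contain each pair of objects."""
    v = len(rows[0])
    return {sum(row[i] * row[j] for row in rows) for i, j in combinations(range(v), 2)}

fano = incidence_matrix(cyclic_design([0, 1, 3], 7), 7)        # (7, 7, 3, 3, 1)
pg23 = incidence_matrix(cyclic_design([0, 1, 3, 9], 13), 13)   # (13, 13, 4, 4, 1)
for rows, k in ((fano, 3), (pg23, 4)):
    print(len(rows), pair_counts(rows), min_distance(rows) == 2 * (k - 1))
# 7 {1} True
# 13 {1} True
```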

Of particular importance is the construction and lower bound given by Graham and Sloane (Thm.
4 of [8]). It states that for q a prime power with q ≥ n, there exists a constant weight code of block
length n, weight w and minimum distance 2δ with

$$A(n, 2\delta, w) \ge \frac{1}{q^{\delta - 1}} \binom{n}{w} \tag{7}$$

codewords. Further, it is shown in Thm. 6 of [8] that q need not be much greater than n.
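The bound (7) can be evaluated directly. This sketch assumes (7) as written above and takes q to be the smallest prime power ≥ n. The helper names are hypothetical. For weight 4 and distance 6 at length 25, the bound guarantees about 20 codewords, while S(2, 4, 25) actually supplies 50. For weight 8 and distance 14 at length 64, the bound falls below 1.

```python
from math import comb

def is_prime_power(q):
    """True iff q = p^e for a prime p and e >= 1."""
    if q < 2:
        return False
    p = next(d for d in range(2, q + 1) if q % d == 0)   # smallest prime factor
    while q % p == 0:
        q //= p
    return q == 1

def gs_lower_bound(n, delta, w):
    """Right-hand side of (7), using the smallest prime power q >= n."""
    q = n
    while not is_prime_power(q):
        q += 1
    return comb(n, w) / q ** (delta - 1)

print(gs_lower_bound(25, 3, 4))        # 20.24
print(gs_lower_bound(64, 7, 8) < 1)    # True
```

At these small lengths the Graham and Sloane estimate is therefore pessimistic compared with the actual Steiner systems.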

VI. A Level-1 Algebraic Construction
Let n = m + k with k < m, so that m/n > 1/2. Let

$$Q_1 = \begin{pmatrix} I_k & C_1 & E_1 \\ 0 & I_{m-k} & C_1^t \end{pmatrix}, \tag{8}$$

where $I_k$ is the identity matrix of size k, the zero sub-matrix is of size (m − k, k), sub-matrix $C_1$
is of size (k, m − k), $E_1$ is of size (k, k), and $C_1^t$ denotes the transpose of $C_1$. For the purposes of
this paper we define a specific version $Q_1(r)$ with $C_1$ chosen as a $(\binom{r}{2}, r)$ matrix whose rows are
all the distinct binary r-tuples of Hamming weight 2. Thus $k = \binom{r}{2}$ and m − k = r. The sub-matrix
$E_1$ is given by $E_1 = C_1 C_1^t - 2I$ (the size of the identity matrix is clear from the context and is
not stated explicitly) and has entries from the set {0, 1}, with zeros along the main diagonal. This
follows directly from the fact that each row of $C_1$ has weight 2 and the inner product of two distinct rows of $C_1$ is in {0, 1}.

Theorem 4. $Q_1(r)$ is UI and achieves a query ratio arbitrarily close to 1/2 for r suitably large.

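Proof: We show that $Q_1(r)$ does not have a nonzero {−1, 0, 1}-valued vector in its null space.

Let

$$Q_1(r) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0, \tag{9}$$

where x, y and z are of size k, m − k and k, respectively. The second block row gives $y = -C_1^t z$,
and substituting this into the first block row gives $x = (C_1 C_1^t - E_1)z$. But $C_1 C_1^t - E_1 = 2I$, so
x = 2z. For {−1, 0, 1}-valued x and z this forces z = 0, and hence x = 0 and y = 0. Thus the only
{−1, 0, 1}-valued vector in $\mathcal{N}(Q_1)$ is the zero vector, and $Q_1(r)$ is UI.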

The query ratio of $Q_1(r)$ is

$$\rho = \frac{\binom{r}{2} + r}{2\binom{r}{2} + r} = \frac{r + 1}{2r},$$

which goes to 1/2 as r grows without bound. ■
Thus achieving a ratio ρ = 0.51 requires r ≥ 50, or equivalently $n = r^2 \ge 2500$. We can thus query
2500 bits with 1275 queries using $Q_1(50)$.
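A direct transcription of (8) helps check the size and ratio formulas. In this sketch the helper names `build_q1`, `is_ui` and `q1_ratio` are hypothetical. Rows of $C_1$ are generated in lexicographic order of their support. The code assembles $Q_1(r)$, confirms that $m/n = (r+1)/(2r)$, and checks UI by enumerating subset sums for r = 3 and r = 4. That enumeration is feasible only for such small r, so r = 50 is handled through the closed form alone.

```python
from fractions import Fraction
from itertools import combinations

def build_q1(r):
    """Q_1(r) = [[I_k, C_1, E_1], [0, I_r, C_1^t]] from (8), with E_1 = C_1 C_1^t - 2I."""
    C = [[int(i in pair) for i in range(r)] for pair in combinations(range(r), 2)]
    k = len(C)                                        # k = r choose 2
    E = [[sum(a * b for a, b in zip(C[i], C[j])) - (2 if i == j else 0)
          for j in range(k)] for i in range(k)]
    Ct = [list(col) for col in zip(*C)]
    top = [[int(i == j) for j in range(k)] + C[i] + E[i] for i in range(k)]
    bottom = [[0] * k + [int(i == j) for j in range(r)] + Ct[i] for i in range(r)]
    return top + bottom

def is_ui(Q):
    """Q is UI iff the 2^n subset column sums of Q are pairwise distinct."""
    sums = [(0,) * len(Q)]
    for col in zip(*Q):
        sums += [tuple(a + b for a, b in zip(s, col)) for s in sums]
    return len(set(sums)) == len(sums)

def q1_ratio(r):
    """Closed form (r + 1) / (2r) = m / n for Q_1(r)."""
    return Fraction(r * (r + 1) // 2, r * r)

for r in (3, 4):
    Q = build_q1(r)
    assert Fraction(len(Q), len(Q[0])) == q1_ratio(r)
    print(r, (len(Q), len(Q[0])), is_ui(Q))
# 3 (6, 9) True
# 4 (10, 16) True
print(q1_ratio(50))    # 51/100
```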
For later use we have the following result. 

Theorem 5. For any nonzero n-vector x with entries in {−1, 0, 1}, $Q_1(r)x$ does not lie in $(4\mathbb{Z})^m$.

Proof: Let x be partitioned into sub-vectors $x_1$, $x_2$ and $x_3$ of size k, r and k, respectively, i.e., let
$x = (x_1, x_2, x_3)$. For some integer m-vector u, let $Q_1(r)x = 4u$, and let $u = (u_1, u_2)$, where $u_1$ and $u_2$
are of size k and m − k, respectively. Thus $x_1 + C_1 x_2 + E_1 x_3 = 4u_1$ and $x_2 + C_1^t x_3 = 4u_2$. Eliminating
$x_2$ gives us

$$x_1 = (C_1 C_1^t - E_1)x_3 + 4(u_1 - C_1 u_2). \tag{10}$$

Here $x_1$ takes values in {−1, 0, 1} and the first term on the right-hand side is $2x_3$. So every entry of
$x_1 - 2x_3$ lies in {−3, ..., 3} and is divisible by 4, and hence is zero. This forces $x_1 = x_3 = 0$ and
$u_1 - C_1 u_2 = 0$. But then $x_2 = 4u_2$, the only solution of which is $x_2 = 0$ and $u_2 = 0$. This implies
$u_1 = 0$. Thus x = 0 and u = 0 is the only solution of $Q_1(r)x = 4u$. ■
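For r = 3 the statement can be confirmed exhaustively over all $3^9$ vectors. The sketch below is our own check, and the helper name `find_vector_in_lattice` is hypothetical. It also shows that the modulus 4 cannot be lowered to 2 for this matrix. For example, $x = (0, -C_1^t e_1, e_1)$ gives $Q_1 x = (-2e_1, 0)$.

```python
from itertools import combinations, product

def build_q1(r):
    """Q_1(r) from (8) with E_1 = C_1 C_1^t - 2I."""
    C = [[int(i in pair) for i in range(r)] for pair in combinations(range(r), 2)]
    k = len(C)
    E = [[sum(a * b for a, b in zip(C[i], C[j])) - (2 if i == j else 0)
          for j in range(k)] for i in range(k)]
    Ct = [list(col) for col in zip(*C)]
    return ([[int(i == j) for j in range(k)] + C[i] + E[i] for i in range(k)] +
            [[0] * k + [int(i == j) for j in range(r)] + Ct[i] for i in range(r)])

def find_vector_in_lattice(Q, modulus):
    """Return a nonzero {-1,0,1}-vector x with Qx in (modulus * Z)^m, or None."""
    n = len(Q[0])
    for x in product((-1, 0, 1), repeat=n):
        if any(x) and all(sum(a * b for a, b in zip(row, x)) % modulus == 0 for row in Q):
            return x
    return None

Q = build_q1(3)
print(find_vector_in_lattice(Q, 4))              # None, as Thm. 5 asserts
print(find_vector_in_lattice(Q, 2) is not None)  # True: for this matrix, modulus 2 does not work
```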

VII. An Iterated Higher-Level Construction
Let $(m_s, n_s)$ be the size of $Q_s(r)$.

Definition 4. For s > 1 we define recursively

$$Q_s(r) = \begin{pmatrix} Q_{s-1}(r) & C_s & E_s \\ 0 & I & C_s^t \end{pmatrix}, \tag{11}$$

where $C_s$ is a binary matrix whose rows are of constant Hamming weight $2^s$. The number of rows of
$C_s$ is equal to the number of rows of $Q_{s-1}$. The number of columns of $C_s$ (the block length of
the code) is the minimum possible such that $d_{\min}$, the minimum Hamming distance between the rows
of $C_s$, is $2(2^s - 1)$. $E_s$ is the binary matrix given by $E_s = C_s C_s^t - 2^s I$.
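One step of (11) can be written as a small function. This sketch uses hypothetical names. Its asserts encode the requirements of Definition 4: one row of $C_s$ per row of $Q_{s-1}$, weight $2^s$, and $E_s$ binary, which is the same as distance at least $2(2^s - 1)$. As a toy instance we take r = 2, so $Q_0 = I_1$ and $Q_1(2)$ is 3 × 4. For $C_2$ we use three weight-4 words of length 9 that meet pairwise in one position. This $C_2$ is our own illustrative choice. Three such words need at least 4 + 3 + 2 = 9 positions, so 9 is the minimum block length here.

```python
def build_level(Q_prev, C, s):
    """One step of (11): [[Q_{s-1}, C_s, E_s], [0, I, C_s^t]] with E_s = C_s C_s^t - 2^s I."""
    a, n_prev, b = len(Q_prev), len(Q_prev[0]), len(C[0])
    assert len(C) == a, "C_s needs one row per row of Q_{s-1}"
    assert all(sum(row) == 2 ** s for row in C), "rows of C_s must have weight 2^s"
    E = [[sum(x * y for x, y in zip(C[i], C[j])) - (2 ** s if i == j else 0)
          for j in range(a)] for i in range(a)]
    assert all(v in (0, 1) for row in E for v in row), "distinct rows must share <= 1 one"
    Ct = [list(col) for col in zip(*C)]
    top = [Q_prev[i] + C[i] + E[i] for i in range(a)]
    bottom = [[0] * n_prev + [int(i == j) for j in range(b)] + Ct[i] for i in range(b)]
    return top + bottom

def rows_from_blocks(blocks, length):
    """Incidence rows of the given blocks over positions 0..length-1."""
    return [[int(j in blk) for j in range(length)] for blk in blocks]

Q0 = [[1]]                                   # Q_0 = I_1 for r = 2 (k = 1)
Q1 = build_level(Q0, [[1, 1]], 1)            # Q_1(2), size (3, 4)
# Hypothetical C_2: three weight-4 words of length 9 meeting pairwise in one position.
C2 = rows_from_blocks([{0, 1, 2, 3}, {0, 4, 5, 6}, {1, 4, 7, 8}], 9)
Q2 = build_level(Q1, C2, 2)
print((len(Q1), len(Q1[0])), (len(Q2), len(Q2[0])))   # (3, 4) (12, 16)
```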

We now have 

Theorem 6. For any nonzero n-vector x with entries in {−1, 0, 1}, and any s ≥ 1, $Q_s(r)x$ does not
lie in $(2^{s+1}\mathbb{Z})^m$, where m is the number of rows of $Q_s(r)$.

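Proof: We have already proved the case s = 1 (Thm. 5). We proceed by induction. Suppose the hypothesis
holds for $Q_{s-1}$, and consider the equation $Q_s(r)x = 2^{s+1}u$, where u is an integer-valued vector of
appropriate dimension. Partition $x = (x_1, x_2, x_3)$ and $u = (u_1, u_2)$ conformally with (11). Following
the proof of Thm. 5, we have

$$Q_{s-1}x_1 = 2^s x_3 + 2^{s+1}(u_1 - C_s u_2), \tag{12}$$
$$x_2 = -C_s^t x_3 + 2^{s+1} u_2. \tag{13}$$

The right-hand side of (12) lies in $(2^s\mathbb{Z})^{m_{s-1}}$, so by the induction hypothesis $x_1 = 0$. Then (12) gives
$x_3 = -2(u_1 - C_s u_2)$. Since $x_3$ is {−1, 0, 1}-valued, this forces $x_3 = 0$ and $u_1 - C_s u_2 = 0$. Since
$x_3 = 0$, it follows from (13) that $x_2 = 2^{s+1}u_2$, so $u_2 = 0$ and $x_2 = 0$. Since $u_1 - C_s u_2 = 0$, it follows that
$u_1 = 0$. Hence x = 0, which proves the claim. ■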



This leads to our main result. 
Theorem 7. $Q_s(r)$ is UI for all s ≥ 1.
Proof: The case s = 1 is Thm. 4. For s ≥ 2, consider the equation

$$Q_s(r) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0, \tag{14}$$

where x, y and z are {−1, 0, 1}-valued. Using (11), this can be simplified to

$$Q_{s-1}(r)x = 2^s z, \tag{15}$$
$$y = -C_s^t z. \tag{16}$$

The right-hand side of (15) lies in $(2^s\mathbb{Z})^{m_{s-1}}$. By Thm. 6 applied to $Q_{s-1}(r)$ we must therefore have x = 0,
and then (15) gives z = 0. From (16) it follows that y = 0. Thus the only {−1, 0, 1}-valued vector in $\mathcal{N}(Q_s)$ is the zero vector. ■
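For a toy level-2 matrix, Theorem 7 can be confirmed by enumerating all $2^{16}$ subset sums. The sketch below assumes r = 2 and a hypothetical $C_2$ made of three weight-4 words of length 9 that meet pairwise in one position. It rebuilds $Q_1(2)$ and $Q_2(2)$ from (11) and checks that both are UI. Larger levels quickly become too big for this kind of brute-force check.

```python
def build_level(Q_prev, C, s):
    """(11): [[Q_{s-1}, C_s, E_s], [0, I, C_s^t]] with E_s = C_s C_s^t - 2^s I."""
    a, n_prev, b = len(Q_prev), len(Q_prev[0]), len(C[0])
    E = [[sum(x * y for x, y in zip(C[i], C[j])) - (2 ** s if i == j else 0)
          for j in range(a)] for i in range(a)]
    Ct = [list(col) for col in zip(*C)]
    return ([Q_prev[i] + C[i] + E[i] for i in range(a)] +
            [[0] * n_prev + [int(i == j) for j in range(b)] + Ct[i] for i in range(b)])

def is_ui(Q):
    """Q is UI iff the 2^n subset column sums of Q are pairwise distinct."""
    sums = [(0,) * len(Q)]
    for col in zip(*Q):
        sums += [tuple(a + b for a, b in zip(s, col)) for s in sums]
    return len(set(sums)) == len(sums)

Q1 = build_level([[1]], [[1, 1]], 1)                       # Q_1(2), size (3, 4)
C2 = [[int(j in blk) for j in range(9)]                    # hypothetical C_2
      for blk in ({0, 1, 2, 3}, {0, 4, 5, 6}, {1, 4, 7, 8})]
Q2 = build_level(Q1, C2, 2)                                # Q_2(2), size (12, 16)
print(is_ui(Q1), is_ui(Q2))   # True True
```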



VIII. Decoding Rule 

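For j = 0, 1, ..., s, let the level-j query matrix $Q_j$ be of size $(m_j, n_j)$. Define $Q_0 = I_{m_0}$, the identity
matrix of size $m_0$. For ease of reference, the recursive structure of the query matrix $Q_j$, j ≥ 1, is
repeated here:

$$Q_j = \begin{pmatrix} Q_{j-1} & C_j & E_j \\ 0 & I & C_j^t \end{pmatrix}. \tag{17}$$

Note that $n_j = n_{j-1} + m_j$, j = 1, 2, ..., s.

Given s and the $m_s$-vector of non-negative integers $\omega^{(s)}$ returned by the oracle, our objective is
to solve

$$Q_s x^{(s)} = \omega^{(s)} \tag{18}$$

for $x^{(s)}$ in $\{0, 1\}^{n_s}$.

It is convenient to write $x^{(j)} = (x^{(j-1)}, y^{(j-1)}, z^{(j-1)})$, j = 1, 2, ..., s. Here the part $x^{(j-1)}$ has
size $n_{j-1}$, $y^{(j-1)}$ has size $m_j - m_{j-1}$ and $z^{(j-1)}$ has size $m_{j-1}$. For $a = m_{j-1}$, $b = m_j - m_{j-1}$ and
j = 1, 2, ..., s we have the following definitions:

$$L^{(j)} := (0 \;\; I_b), \tag{19}$$
$$R^{(j)} := (I_a \;\; -C_j), \tag{20}$$
$$\omega^{(j-1)} := R^{(j)} \omega^{(j)}. \tag{21}$$

Also define $u^{(s)} = 0$, $u^{(s-1)} := 2^s z^{(s-1)}$, and for j = 1, 2, ..., s − 1 define

$$u^{(j-1)} := 2^j z^{(j-1)} + R^{(j)} u^{(j)}. \tag{22}$$

Note that $C_j$ and $I_b$ have an equal number of columns.

Decoding is as follows. Start with $Q_s x^{(s)} = \omega^{(s)}$, pre-multiply both sides by $R^{(s)}$, and apply (17)
and the above definitions. This gives, for j = s, s − 1, ..., 1,

$$y^{(j-1)} = -C_j^t z^{(j-1)} + L^{(j)}\bigl(\omega^{(j)} + u^{(j)}\bigr), \tag{23}$$
$$Q_{j-1} x^{(j-1)} - u^{(j-1)} = \omega^{(j-1)}. \tag{24}$$

Note that all entries in $u^{(j-1)}$ are divisible by $2^j$. We start uncovering the bits with

$$x^{(0)} - u^{(0)} = \omega^{(0)}. \tag{25}$$

Since $x^{(0)}$ is binary and $u^{(0)}$ is even, this is solved for $x^{(0)}$ by

$$x^{(0)} = \langle \omega^{(0)} \rangle_2, \tag{26}$$

where $\langle x \rangle_2$ is the componentwise residue of x modulo 2. Successive bits are then recovered by solving, for
j = 0, 1, ..., s − 1,

$$u^{(j)} = Q_j x^{(j)} - \omega^{(j)}, \tag{27}$$
$$z^{(j)} = \bigl\langle u^{(j)} / 2^{j+1} \bigr\rangle_2, \tag{28}$$
$$y^{(j)} = \bigl\langle L^{(j+1)} \omega^{(j+1)} - C_{j+1}^t z^{(j)} \bigr\rangle_2, \tag{29}$$
$$x^{(j+1)} = (x^{(j)}, y^{(j)}, z^{(j)}). \tag{30}$$

Equation (28) follows from (22), because $R^{(j+1)} u^{(j+1)}$ is divisible by $2^{j+2}$. Equation (29) follows from (23),
because $L^{(j+1)} u^{(j+1)}$ is even. This completes the decoding process.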
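The decoding rule can be sketched in Python as a downward pass through (21) followed by an upward pass through (26) to (30). All names are hypothetical, and the code is our reading of the equations above, not the implementation used for the simulation in Sec. X. The matrices are plain integer lists, and the usage example is a toy level-2 instance with r = 2. That instance uses a hypothetical 9-column $C_2$ made of three weight-4 words meeting pairwise in one position.

```python
import random

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def build_level(Q_prev, C, s):
    """(17): [[Q_{s-1}, C_s, E_s], [0, I, C_s^t]] with E_s = C_s C_s^t - 2^s I."""
    a, n_prev, b = len(Q_prev), len(Q_prev[0]), len(C[0])
    E = [[sum(x * y for x, y in zip(C[i], C[j])) - (2 ** s if i == j else 0)
          for j in range(a)] for i in range(a)]
    Ct, I = transpose(C), identity(b)
    return ([Q_prev[i] + C[i] + E[i] for i in range(a)] +
            [[0] * n_prev + I[i] + Ct[i] for i in range(b)])

def build(Cs):
    """Q_s from Cs = [C_1, ..., C_s], starting at Q_0 = I_{m_0}."""
    Q = identity(len(Cs[0]))
    for s, C in enumerate(Cs, start=1):
        Q = build_level(Q, C, s)
    return Q

def decode(Cs, omega):
    """Recover binary x from omega = Q_s x via (21) and (26)-(30)."""
    s = len(Cs)
    omegas = [None] * s + [list(omega)]
    for j in range(s, 0, -1):                    # (21): omega^(j-1) = R^(j) omega^(j)
        a = len(Cs[j - 1])
        top, bottom = omegas[j][:a], omegas[j][a:]
        omegas[j - 1] = [t - c for t, c in zip(top, matvec(Cs[j - 1], bottom))]
    x = [w % 2 for w in omegas[0]]               # (26), because Q_0 = I
    Q = identity(len(x))
    for j in range(s):
        C = Cs[j]                                # C_{j+1} in the text's indexing
        u = [q - w for q, w in zip(matvec(Q, x), omegas[j])]               # (27)
        z = [(v // 2 ** (j + 1)) % 2 for v in u]                            # (28)
        bottom = omegas[j + 1][len(C):]                                     # L^(j+1) omega^(j+1)
        y = [(w - c) % 2 for w, c in zip(bottom, matvec(transpose(C), z))]  # (29)
        Q = build_level(Q, C, j + 1)
        x = x + y + z                                                       # (30)
    return x

# Usage: r = 2 with a hypothetical 9-column C_2.
Cs = [[[1, 1]],
      [[int(j in blk) for j in range(9)] for blk in ({0, 1, 2, 3}, {0, 4, 5, 6}, {1, 4, 7, 8})]]
Q2 = build(Cs)
rng = random.Random(1)
for _ in range(200):
    x = [rng.randint(0, 1) for _ in range(len(Q2[0]))]
    assert decode(Cs, matvec(Q2, x)) == x
```

Python's `%` always returns a value in {0, 1} for modulus 2, even for negative operands, which is what $\langle \cdot \rangle_2$ requires here.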
IX. Achievable Query Ratios

Let $(m_{s-1}, n_{s-1})$ be the size of $Q_{s-1}(r)$ and let $n_c$ denote the number of columns of $C_s$ in (11).
Note that $m_{s-1}$ and $n_{s-1}$ grow with r.

Theorem 8. For r suitably large, $\rho_s(r)$, the query ratio of $Q_s(r)$, is arbitrarily close to 1/(s + 1).

Proof: We proceed by induction. The hypothesis has been proved for s = 1. Assume that it is
true for $Q_{s-1}(r)$ and let $\rho_{s-1}(r)$ be its query ratio. Take (7) with $w = 2^s$, $\delta = 2^s - 1$ and q close to $n_c$.
It guarantees roughly $n_c^2/(2^s)!$ codewords. Hence, from [8], there exists a constant weight code with weight $2^s$ and
minimum Hamming distance $2(2^s - 1)$ with $m_{s-1}$ codewords, provided $n_c \ge \sqrt{(2^s)!\, m_{s-1}}$ and $m_{s-1}$ is
suitably large. Thus

$$m_s = m_{s-1} + n_c = n_{s-1}\rho_{s-1}(r) + \sqrt{(2^s)!\, m_{s-1}},$$
$$n_s = n_{s-1}\bigl(1 + \rho_{s-1}(r)\bigr) + \sqrt{(2^s)!\, m_{s-1}}.$$

The square-root term grows only as the square root of $m_{s-1} \le n_{s-1}$. Therefore

$$\lim_{r \to \infty} \rho_s(r) = \lim_{r \to \infty} \frac{\rho_{s-1}(r)}{1 + \rho_{s-1}(r)} = \frac{1/s}{1 + 1/s} = \frac{1}{s + 1}. \quad ■$$

Thus we can achieve an arbitrarily small query ratio by choosing s and r sufficiently large.
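The recursion in the proof can be iterated numerically. This sketch assumes the proof's estimate $n_c = \lceil \sqrt{(2^s)!\, m_{s-1}} \rceil$ for the block length of $C_s$. It does not construct any code, and the names are hypothetical. It starts from the exact size of $Q_1(r)$ and applies the size rules of (11).

```python
from math import factorial, isqrt

def ceil_sqrt(t):
    """Smallest integer s with s * s >= t (t >= 0)."""
    s = isqrt(t)
    return s if s * s == t else s + 1

def estimated_ratios(r, s_max):
    """rho_1(r), ..., rho_{s_max}(r) under the assumption n_c = ceil(sqrt((2^s)! m_{s-1}))."""
    m, n = r * (r + 1) // 2, r * r          # size of Q_1(r)
    ratios = [m / n]
    for s in range(2, s_max + 1):
        n_c = ceil_sqrt(factorial(2 ** s) * m)
        m, n = m + n_c, n + n_c + m         # sizes from (11); right side uses the old m
        ratios.append(m / n)
    return ratios

for s, rho in enumerate(estimated_ratios(10_000, 4), start=1):
    print(s, round(rho, 4), round(1 / (s + 1), 4))
```

With r = 10,000, the estimates for s = 2 and s = 3 land close to 1/3 and 1/4. For s = 4 the ratio is close to 1, because the $(2^s)!$ factor then dominates. This illustrates how quickly "r suitably large" grows with s.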



X. Example Constructions 
We construct level-1, level-2 and level-3 query matrices in the examples presented below.

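Example 3. Let r = 4. Thus n = 16 and m = 10. We construct $Q_1(4)$ by picking as the rows of $C_1$
the six binary 4-tuples of weight 2. Thus

$$C_1 = \begin{pmatrix} 1&1&0&0 \\ 1&0&1&0 \\ 1&0&0&1 \\ 0&1&1&0 \\ 0&1&0&1 \\ 0&0&1&1 \end{pmatrix}, \qquad
E_1 = \begin{pmatrix} 0&1&1&1&1&0 \\ 1&0&1&1&0&1 \\ 1&1&0&0&1&1 \\ 1&1&0&0&1&1 \\ 1&0&1&1&0&1 \\ 0&1&1&1&1&0 \end{pmatrix},$$

and $Q_1(4)$ is completely specified. $Q_1(4)$ has query ratio 5/8.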
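The matrices of Example 3 can be regenerated and checked. This is a small sketch with hypothetical names. It lists the weight-2 4-tuples in lexicographic order of their support, computes $E_1 = C_1 C_1^t - 2I$, and assembles $Q_1(4)$ from (8). It then confirms the size (10, 16) and the UI property by enumerating all $2^{16}$ subset sums.

```python
from itertools import combinations

def is_ui(Q):
    """Q is UI iff the 2^n subset column sums of Q are pairwise distinct."""
    sums = [(0,) * len(Q)]
    for col in zip(*Q):
        sums += [tuple(a + b for a, b in zip(s, col)) for s in sums]
    return len(set(sums)) == len(sums)

r, k = 4, 6
C1 = [[int(i in pair) for i in range(r)] for pair in combinations(range(r), 2)]
E1 = [[sum(a * b for a, b in zip(u, v)) - (2 if i == j else 0)
       for j, v in enumerate(C1)] for i, u in enumerate(C1)]
Ct = [list(col) for col in zip(*C1)]
Q1 = ([[int(i == j) for j in range(k)] + C1[i] + E1[i] for i in range(k)] +
      [[0] * k + [int(i == j) for j in range(r)] + Ct[i] for i in range(r)])

print(["".join(map(str, row)) for row in E1])
# ['011110', '101101', '110011', '110011', '101101', '011110']
print((len(Q1), len(Q1[0])), is_ui(Q1))   # (10, 16) True
```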

Example 4. We construct $Q_2(9)$ by first constructing $Q_1(9)$ as described; $Q_1(9)$ is a (45, 81) binary
matrix. For $C_2$ in $Q_2(9)$, we selected 45 rows of the incidence matrix of the Steiner system S(2, 4, 25)
tabulated at
Note that |S(2, 4, 25)| = 50 = 25 · 24/(4 · 3), in agreement with (6). The matrix $Q_2(9)$ has size (70, 151), thus
achieving a query ratio of 70/151 < 1/2. A decoding rule was implemented for this design, and
error-free decoding was observed in a simulation consisting of 10,000 test vectors.
Example 5. We construct $Q_3(9)$ by selecting for $C_3$ 70 rows of the incidence matrix of the Steiner
system S(2, 8, 64) [5], which has |S(2, 8, 64)| = 72. The query ratio ρ is larger than in the previous
example because the size is not large enough. Unfortunately, there are no larger published S(2, 8, v)
designs currently available.
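The sizes in Examples 4 and 5 follow from the block structure of (11). The $C_s$ contribute $n_c = 25$ and $n_c = 64$ columns, taken from the Steiner systems quoted above. This sketch uses a hypothetical helper name and only reproduces the arithmetic. The size (134, 285) for $Q_3(9)$ is derived here from (11) rather than stated in the text. Its ratio 134/285 ≈ 0.470 is indeed larger than 70/151 ≈ 0.464, as Example 5 remarks.

```python
from fractions import Fraction

def next_size(m_prev, n_prev, n_c):
    """Size of Q_s from (11): m_s = m_{s-1} + n_c and n_s = n_{s-1} + n_c + m_{s-1}."""
    return m_prev + n_c, n_prev + n_c + m_prev

r = 9
k = r * (r - 1) // 2                 # number of weight-2 rows of C_1
q1 = (k + r, 2 * k + r)              # Q_1(9)
q2 = next_size(*q1, 25)              # C_2: 45 of the 50 blocks of S(2, 4, 25)
q3 = next_size(*q2, 64)              # C_3: 70 of the 72 blocks of S(2, 8, 64)
for size in (q1, q2, q3):
    print(size, Fraction(*size))
# (45, 81) 5/9
# (70, 151) 70/151
# (134, 285) 134/285
```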

We close with a graph showing the various results from the paper. The curve labeled LY is the
existence result (3). The curve labeled 'Packing' is the result of Thm. 1. The curves labeled '$Q_s$ : GS'
use the bound (7) to estimate the query matrix size. The curve labeled '$Q_s$ : W' uses Wilson's theorem, via (6), to estimate
the size of $C_s$. The data points are for Examples 3–5.

Wilson's theorem guarantees existence only for suitably large designs, so this curve needs to be
interpreted carefully. It was included because it more closely matches the data for small
block designs, as can be seen from how close it comes to the performance of $Q_3(9)$.

Unfortunately, data on larger designs are not available.




[Figure: query ratio versus n, the size of the unknown vector]

Fig. 1. Query ratio as a function of the size of the unknown vector. Shown are various bounds, the performance of $Q_1$, $Q_2$ and $Q_3$, and
the constructions of Examples 3–5.
XI. Conclusions

Binary query matrices are constructed for the Hamming oracle. The construction is algebraic
and uses previously known codes of constant Hamming weight with a specified minimum distance.
Starting from a level-1 construction, a sequence of query matrices is constructed by iterating a simple
design rule. Thus a level-i query matrix is constructed using a level-(i − 1) matrix, i > 1. Our query
matrices are shown to be uniquely identifying, i.e., it is possible to uniquely determine any unknown
binary vector x using the query vectors in a query matrix. We also establish a connection between our
problem and the distinct subset sum problem studied in the combinatorics literature. To be specific,
our construction makes use of the set $\{1, 2, 4, \ldots, 2^n\}$, which is the simplest example of a set with
distinct subset sums. It is not clear whether the construction presented here can take advantage of
other DSS sets presented in [2], or whether there is a significant advantage in doing so.